Measurement-Test-Evaluation
Sakineh Yeganegi

Language testing rests on the definitions of three concepts: ‘test’, ‘measurement’, and ‘evaluation’. This article covers how these three terms are distinct from each other, the different types of measurement scales and their properties, the essential qualities of measures (reliability and validity), and the characteristics of measures that limit our interpretations of test results.

The process of measurement is described as a set of steps which, if followed in test development, will provide the basis for both reliable test scores and valid test use. The terms ‘measurement’, ‘test’, and ‘evaluation’ are often used synonymously; indeed they may, in practice, refer to the same activity.

What is a test?

A test, in simple terms, is a method of measuring a person’s ability, knowledge, or performance in a given domain (Brown, 2004). A test is a method: it is an instrument, that is, a set of techniques, items, and procedures, that requires performance on the part of the test-taker (Brown, 2004). Bachman (1996, p. 20) bases his definition of a test on that of Carroll (1968): ‘A psychological or educational test is a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual’ (Carroll 1968: 46).

What is measurement?

Measurement refers to the process of quantifying the characteristics of individuals according to explicit rules and procedures. Measurement has two requirements: first, there must be a set of clear objectives for measuring the attribute or property; second, the attribute or property must be quantifiable. In education, measurement includes rating, ranking, and measuring instruments called tests. Rating and ranking involve an evaluative summary of past or present experiences for the purpose of making a final judgment.
Ratings and rankings rest on the personal opinion and judgment of the teacher or rater (Jafarpour, 2001). According to Bachman (1996, pp. 18-20), measurement is the process of quantifying the characteristics of persons according to explicit procedures and rules. This definition includes three distinguishing features: quantification, characteristics, and explicit rules and procedures.

Quantification

Quantification involves the assigning of numbers, and this distinguishes measures from qualitative descriptions such as verbal accounts or nonverbal, visual representations. Non-numerical categories or rankings such as letter grades (‘A, B, C . . .’) or labels (for example, ‘excellent, good, average . . .’) may have the characteristics of measurement; these are discussed below under ‘Properties of measurement scales’.

Characteristics

Physical attributes such as height and weight can be observed directly. In testing, however, we are almost always interested in quantifying mental attributes and abilities, sometimes called traits or constructs, which can only be observed indirectly. These mental attributes include characteristics such as aptitude, intelligence, motivation, field dependence / independence, attitude, native language, fluency in speaking, and achievement in reading comprehension. In testing, the characteristic being measured is the ability of the test-taker.

Rules and procedures

The third distinguishing characteristic of measurement is that quantification must be done according to explicit rules and procedures. That is, the ‘blind’ or haphazard assignment of numbers to characteristics of individuals cannot be regarded as measurement. In order to be considered a measure, an observation of an attribute must be replicable by other observers, in other contexts, and with other individuals.

What is evaluation?

The process of gathering information for making decisions is called evaluation.
Evaluation can be qualitative, quantitative, or both. Qualitative evaluation is subjective: it is based on observations and verbal or nonverbal descriptions such as letters of reference or general impressions. Evaluation of this sort, used as feedback for making modifications while a process is still under way, is called formative evaluation. Quantitative evaluation relates to objective information obtained through measurement. Evaluation that involves quantitative information is called summative evaluation; its purpose is to report on the quality of a process that has already been completed (Jafarpour, 2001).

According to Bachman (1996, p. 22), evaluation is the systematic gathering of information for the purpose of making decisions (Weiss 1972). The probability of making the correct decision in any given situation is a function not only of the ability of the decision maker, but also of the quality of the information upon which the decision is based. Evaluation, therefore, does not necessarily entail testing. By the same token, tests in and of themselves are not evaluative. Tests are often used for pedagogical purposes, either as a means of motivating students to study or as a means of reviewing material taught, in which case no evaluative decision is made on the basis of the test results. Tests may also be used for purely descriptive purposes. The relationships among measurement, tests, and evaluation are illustrated in the following figure.

[Figure: The relationship between tests, evaluation, and measurement]

Essential measurement qualities

To interpret a test score as an indicator of an individual’s ability, the score must be reliable and valid. These two qualities are essential to the interpretation and use of measures of language abilities.

Reliability

According to Bachman (1996), reliability is a quality of test scores: a perfectly reliable score or measure would be free from errors of measurement.
Individuals’ performance may be affected by differences in testing conditions, fatigue, and anxiety, and they may thus obtain scores that are inconsistent from one occasion to the next.

Validity

The most important quality of test use is validity. Validity concerns whether the decisions we make on the basis of test scores are meaningful, appropriate, and useful; for this, the test score must be a meaningful indicator of a particular individual’s ability. A test score that is not reliable cannot be valid. Thus reliability is a quality of test scores themselves, while validity is a quality of test interpretation and use.

Properties of measurement scales

In order to measure an attribute or ability of an individual, we need to determine what set of numbers will provide the best measurement. The set of numbers used for measurement must be appropriate to the ability or attribute measured, and the different ways of organizing these sets of numbers constitute scales of measurement. There are four types of measurement scales: nominal, ordinal, interval, and ratio.

Nominal scale

A nominal scale comprises numbers that are used to ‘name’ the classes or categories of a given attribute. That is, we can use numbers as a shorthand code for identifying different categories. We could assign different code numbers to individuals with different native language backgrounds (for example, Amharic = 1, Arabic = 2, Bengali = 3, Chinese = 4, etc.) and thus create a nominal scale for this attribute.

Ordinal scale

An ordinal scale, as its name suggests, comprises the numbering of different levels of an attribute that are ordered with respect to each other. The most common example of an ordinal scale is a ranking, in which individuals are ranked ‘first’, ‘second’, ‘third’, and so on, according to some attribute or ability.

Interval scale

An interval scale is a numbering of different levels in which the distances, or intervals, between the levels are equal.
That is, in addition to the ordering that characterizes ordinal scales, interval scales consist of equal distances or intervals between ordered levels.

Ratio scale

When a scale consists not only of equidistant points but also has a meaningful zero point, we refer to it as a ratio scale. If we ask respondents their ages, the difference between any two consecutive years is always the same, and ‘zero’ signifies the absence of age, that is, birth. Hence, a 100-year-old person is indeed twice as old as a 50-year-old one. Sales figures, quantities purchased, and market share are all expressed on a ratio scale.

Characteristics that limit measurement

Test developers and test users want their tests to be the best measures possible, and the usefulness and meaningfulness of a test depend on how its scores are interpreted. Measures of mental abilities have characteristics that place limitations on the interpretation of test scores. These limitations are of two kinds: limitations in specification, and limitations in observation and quantification.

Limitations in specification

In language testing, a large number of factors affect an individual’s performance, such as the testing context, the type of test tasks required, the time of day, and so on. The most important factor is the individual’s ability that the test is intended to measure.

Limitations in observation and quantification

In addition to the factors that affect performance, there are characteristics of the processes of observation and quantification that limit the interpretation of test results. These derive from the fact that all measures of mental ability are necessarily indirect, incomplete, imprecise, subjective, and relative.

Steps in measurement

The development of language tests needs to be based on a logical sequence of procedures linking the putative ability, or construct, to the observed performance.
This sequence includes three steps: (1) identifying and defining the construct theoretically; (2) defining the construct operationally; and (3) establishing procedures for quantifying observations (Bachman, 1996).
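
The four scale types described under the properties of measurement scales can be illustrated with a short sketch. It is not taken from the sources cited above. The names Scale, central_tendency, and ratio_is_meaningful are hypothetical, and pairing each scale with a mode, median, or mean follows a common statistical convention rather than anything stated in this article.

```python
from enum import IntEnum
from statistics import mean, median_low, mode


class Scale(IntEnum):
    """Hypothetical encoding of the four scale types, from least to most informative."""
    NOMINAL = 1   # numbers only name categories
    ORDINAL = 2   # categories are ordered
    INTERVAL = 3  # ordered, with equal distances between levels
    RATIO = 4     # equal distances plus a meaningful zero point


def central_tendency(values, scale):
    """Return an average that is meaningful for the given scale (assumed convention)."""
    if scale == Scale.NOMINAL:
        return mode(values)        # only category frequency is meaningful
    if scale == Scale.ORDINAL:
        return median_low(values)  # order is meaningful, distances are not
    return mean(values)            # interval and ratio: distances are equal


def ratio_is_meaningful(scale):
    """Statements such as 'twice as old' require a meaningful zero point."""
    return scale == Scale.RATIO


# Native-language codes as in the text: Amharic = 1, Arabic = 2, Bengali = 3, Chinese = 4
language_codes = [2, 4, 2, 1, 2]
ranks = [1, 2, 3, 4, 5]
ages = [50, 100]

print(central_tendency(language_codes, Scale.NOMINAL))  # most frequent language code
print(central_tendency(ranks, Scale.ORDINAL))           # middle rank
if ratio_is_meaningful(Scale.RATIO):
    print(ages[1] / ages[0])                            # how many times older
```

Averaging the language codes with a mean would give a number that corresponds to no language at all, which is why the sketch falls back to the mode for nominal data.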